{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pandas\n",
    "Pandas库基于Numpy库，提供了很多用于数据操作与分析的功能。\n",
    "\n",
    "## 安装与使用\n",
    "安装：  \n",
    "`pip install pandas`  \n",
    "根据惯例，我们使用如下的方式引入pandas：  \n",
    "`import pandas as pd`\n",
    "\n",
    "## 两个常用数据类型\n",
    "pandas提供两个常用的数据类型：\n",
    "* Series\n",
    "* DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Series类型\n",
    "Series类型类似于Numpy的一维数组对象，可以将该类型看做是一组数据与数据相关的标签（索引）联合而构成（带有标签的一维数组对象）。 \n",
    "## 创建方式\n",
    "Series常用的创建（初始化）方式：\n",
    "* 列表等可迭代对象\n",
    "* ndarray数组对象\n",
    "* 字典对象\n",
    "* 标量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1212\n",
       "1       2\n",
       "2       3\n",
       "3       4\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 创建Series：使用列表\n",
    "s = pd.Series([1212, 2, 3, 4])\n",
    "display(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0\n",
       "1    1\n",
       "2    2\n",
       "3    3\n",
       "4    4\n",
       "5    5\n",
       "6    6\n",
       "7    7\n",
       "8    8\n",
       "9    9\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 创建Series：使用可迭代对象\n",
    "s = pd.Series(range(10))\n",
    "display(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "3    4\n",
       "dtype: int32"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 创建Series：使用ndarray数组\n",
    "s = pd.Series(np.array([1, 2, 3, 4]))\n",
    "display(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a       xy\n",
       "b    34234\n",
       "c     3243\n",
       "dtype: object"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 创建Series：使用字典。字典的key充当标签，字典的value充当Series的值。\n",
    "s = pd.Series({\"a\":\"xy\", \"b\":\"34234\", \"c\":\"3243\"})\n",
    "display(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    33\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 创建Series：标量\n",
    "s = pd.Series(33)\n",
    "display(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "k    33\n",
       "x    33\n",
       "y    33\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 在创建Series时，可以使用index参数来显式指定索引。如果没有显式指定，则默认从0开始进行排列。\n",
    "s = pd.Series(33, index=[\"k\", \"x\", \"y\"])\n",
    "display(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 相关属性\n",
    "* index\n",
    "* values\n",
    "* shape\n",
    "* size\n",
    "* dtype\n",
    "\n",
    "Series对象可以通过index与values访问索引与值。其中，我们也可以通过修改index属性来修改Series的索引。\n",
    "\n",
    "说明：\n",
    "* 如果没有指定索引，则会自动生成从0开始的整数值索引，也可以使用index显式指定索引。  \n",
    "* Series对象与index具有name属性。Series的name属性可在创建时通过name参数指定。\n",
    "* 当数值较多时，可以通过head与tail访问前 / 后N个数据。【中间的怎么办？】\n",
    "* Series对象的数据只能是一维数组类型。\n",
    "\n",
    "## <font color=\"green\">Series也可以通过索引进行访问数据，与Numpy的ndarray数组对象索引是否存在不同？</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['a', 'b', 'c', 'd'], dtype='object')"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "array([1, 2, 3, 4], dtype=int64)"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "numpy.ndarray"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "(4,)"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "4"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "dtype('int64')"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s = pd.Series([1, 2, 3, 4], index=list(\"abcd\"))\n",
    "# 返回Series的索引对象。\n",
    "display(s.index)\n",
    "# 返回Series所关联的数组数据。naarray类型。\n",
    "display(s.values, type(s.values))\n",
    "# 返回Series对象的形状\n",
    "display(s.shape)\n",
    "# 返回元素的个数\n",
    "display(s.size)\n",
    "# 返回元素的类型\n",
    "display(s.dtype)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index Name\n",
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "3    4\n",
       "Name: Series name, dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "Name: Series name, dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "index name\n",
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "d    4\n",
       "Name: Series name, dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s = pd.Series([1, 2, 3, 4])\n",
    "# 在创建之后设置Series的name与Series的index的name。\n",
    "s.name = \"Series name\"\n",
    "s.index.name = \"Index Name\"\n",
    "display(s)\n",
    "\n",
    "# 在创建Series的同时，指定name。\n",
    "s2 = pd.Series([1, 2, 3], name=\"Series name\")\n",
    "display(s2)\n",
    "\n",
    "# 创建index对象。可以通过name参数来指定index的名称。\n",
    "index = pd.Index([\"a\", \"b\", \"c\", \"d\"], name=\"index name\")\n",
    "s3 = pd.Series([1, 2, 3, 4], index=index, name=\"Series name\")\n",
    "display(s3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0\n",
       "1    1\n",
       "2    2\n",
       "3    3\n",
       "4    4\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "995    995\n",
       "996    996\n",
       "997    997\n",
       "998    998\n",
       "999    999\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 通过head / tail 访问前（后）N条记录\n",
    "s = pd.Series(range(1000))\n",
    "display(s.head())\n",
    "display(s.tail())\n",
    "# head与tail可以指定显示的条数，默认为5条。\n",
    "s.head(3)\n",
    "\n",
    "# Series必须是一维的数据\n",
    "# s = pd.Series(np.random.rand(3, 5))\n",
    "# display(s)\n",
    "\n",
    "# 不指定Index的时候，直接生成0-n的索引\n",
    "s = pd.Series([1, 2, 3])\n",
    "display(s)\n",
    "display(s[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Series与ndarray数组都可以通过索引访问元素。不同点：\n",
    "# ndarray就是类似与list的索引，支持负数，表示倒数。\n",
    "# Series类似字典的key:value形式的索引。不支持负数。\n",
    "a = np.array([1, 2, 3])\n",
    "display(a[-3])\n",
    "s = pd.Series([1, 2, 3], index=[0, 1, 2])\n",
    "display(s)\n",
    "# s[-3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Series相关操作\n",
    "Series在操作上，与Numpy数据具有如下的相似性：\n",
    "* 支持广播与矢量化运算。\n",
    "* 支持索引与切片。\n",
    "* 支持整数数组与布尔数组提取元素。\n",
    " \n",
    "## 运算\n",
    "Series类型也支持矢量化运算与广播操作。计算规则与Numpy数组的规则相同。同时，Numpy的一些函数，也适用于Series类型，例如，np.mean，np.sum等。  \n",
    "多个Series运算时，会根据索引进行对齐。当索引无法匹配时，结果值为NaN（缺失值）。\n",
    "\n",
    "说明：\n",
    "* 我们可以通过pandas或Series的isnull与notnull来判断数据是否缺失。\n",
    "* 除了运算符以外，我们也可以使用Series对象提供的相关方法进行运算【可以指定缺失的填充值】。\n",
    "* 尽管Numpy的一些函数，也适用于Series类型，但Series与ndarray数组对于空值NaN的计算处理方式上是不同的。【Numpy的计算，会得到NaN，而Series会忽略NaN】"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     4\n",
       "1    10\n",
       "2    18\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0     5\n",
       "1    10\n",
       "2    15\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "2.0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s1 = pd.Series([1, 2, 3])\n",
    "s2 = pd.Series([4, 5, 6])\n",
    "display(s1 * s2)\n",
    "display(s1 * 5)\n",
    "\n",
    "# 对于numpy的一些函数，例如mean，sum等，也适用于Series。\n",
    "display(np.mean(s1), np.sum(s1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    NaN\n",
       "2    6.0\n",
       "3    8.0\n",
       "4    NaN\n",
       "dtype: float64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "1    101.0\n",
       "2      6.0\n",
       "3      8.0\n",
       "4    106.0\n",
       "dtype: float64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s1 = pd.Series([1, 2, 3], index=[1, 2, 3])\n",
    "s2 = pd.Series([4, 5, 6], index=[2, 3, 4])\n",
    "# Series与ndarray数组计算的不同。Series运行时，会根据标签进行对齐，如果标签无法匹配（对齐），就会产生空值（NaN）。\n",
    "display(s1 + s2)\n",
    "\n",
    "# 如果不想产生空值，则可以使用Series提供的计算方法来代替运算符的计算。\n",
    "display(s1.add(s2, fill_value=100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    2.0\n",
       "2    3.0\n",
       "3    NaN\n",
       "4    NaN\n",
       "dtype: float64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2    False\n",
       "3     True\n",
       "4     True\n",
       "dtype: bool"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0     True\n",
       "1     True\n",
       "2     True\n",
       "3    False\n",
       "4    False\n",
       "dtype: bool"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 判断是否为空值。\n",
    "s = pd.Series([1, 2, 3, float(\"NaN\"), np.nan])\n",
    "display(s)\n",
    "\n",
    "# 判断是否为空值。\n",
    "display(s.isnull())\n",
    "\n",
    "# 判断是否不是空值。\n",
    "display(pd.notnull(s))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "2.5"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# np.mean, np.sum等函数，在处理ndarray数组与Series时，表现的不同。\n",
    "# 对于Series，会忽略掉NaN。\n",
    "a = np.array([1, 2, 3, 4, np.nan])\n",
    "s = pd.Series([1, 2, 3, 4, np.nan])\n",
    "display(np.mean(a))\n",
    "display(np.mean(s))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 索引\n",
    "### 标签索引与位置索引\n",
    "如果Series对象的index值为非数值类型，通过\\[索引\\]访问元素，索引既可以是标签索引，也可以是位置索引。这会在一定程度上造成混淆。我们可以通过：\n",
    "* loc 仅通过标签索引访问。\n",
    "* iloc 仅通过位置索引访问。\n",
    "\n",
    "这样，就可以更加具有针对性去访问元素。\n",
    "\n",
    "### 整数数组索引与布尔数组索引\n",
    "Series也支持使用整数数组与布尔数组进行索引。\n",
    "说明：\n",
    "* 与ndarray数组的整数索引不太相同，Series的整数数组索引，既可以是标签数组索引，也可以是位置数组索引。\n",
    "* 与Numpy数组相同，二者返回的是原数组数据的拷贝（复制）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Series的索引分为标签索引与位置索引。\n",
    "s = pd.Series([1, 2, 3], index=list(\"abc\"))\n",
    "display(s)\n",
    "\n",
    "# 既通过标签索引访问，也可以通过位置索引访问。\n",
    "display(s[\"a\"])\n",
    "display(s[0])\n",
    "\n",
    "# 如果指定的索引是数值类型，则位置索引就失灵。\n",
    "s = pd.Series([1, 2, 3], index=[2, 3, 4])\n",
    "display(s[3])\n",
    "# 出错，因为位置索引不再可用。\n",
    "# display(s[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 为了避免上诉的混淆性，我们可以通过loc与iloc进行更有针对的访问。\n",
    "# loc 专门针对标签进行访问\n",
    "# iloc专门针对位置进行访问\n",
    "s = pd.Series([1, 2, 3], index=list(\"abc\"))\n",
    "display(s.loc[\"a\"])\n",
    "display(s.iloc[0])\n",
    "# 错误\n",
    "# display(s.loc[0])\n",
    "# 错误\n",
    "# display(s.iloc[\"a\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "d    4\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "d    4\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 通过整数数组（标签数组索引或者位置数组索引）索引访问元素。\n",
    "s = pd.Series([1, 2, 3, 4], index=list(\"abcd\"))\n",
    "display(s.loc[\"a\"])\n",
    "# 通过标签数组索引元素。\n",
    "display(s.loc[[\"a\", \"d\"]])\n",
    "# 通过位置数组索引元素。\n",
    "display(s.iloc[[0, 3]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "d    4\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "a    1000\n",
       "d       4\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 通过数组索引返回的是原数据的拷贝（彼此之间不受干扰）。【ndarray也是如此】\n",
    "s = pd.Series([1, 2, 3, 4], index=list(\"abcd\"))\n",
    "s2 = s.iloc[[0, 3]]\n",
    "s2[0] = 1000\n",
    "display(s, s2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "c    3\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "a    False\n",
       "b    False\n",
       "c     True\n",
       "d     True\n",
       "dtype: bool"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "c    3\n",
       "d    4\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s = pd.Series([1, 2, 3, 4], index=list(\"abcd\"))\n",
    "display(s[[True, False, True, False]])\n",
    "# 实际过程中，布尔数组都是通过计算得出的。\n",
    "b_array = s > 2\n",
    "display(b_array)\n",
    "display(s[s > 2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 切片\n",
    "Series也支持切片访问一个区间的元素。与Numpy的数组相同，切片返回的是原数组数据的视图。\n",
    "## <font color=\"green\">Series的切片与Numpy的ndarray数组对象切片是否存在不同？</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1000\n",
       "1       2\n",
       "2       3\n",
       "3       4\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0    1000\n",
       "1       2\n",
       "2       3\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Series也支持切片操作。与ndarray相同的是，Series切片返回的也是原数据的视图。\n",
    "s1 = pd.Series([1, 2, 3, 4])\n",
    "s2 = s1[0:3]\n",
    "\n",
    "# 要对s2改变，会影响到以前的s1。\n",
    "s2[0] = 1000\n",
    "display(s1, s2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "d    4\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Series的索引分为标签索引与位置索引，二者在切片的行为上是不一致的。\n",
    "# 通过位置索引切片，不包含末尾的值，通过标签索引切片，包含末尾的值。\n",
    "\n",
    "s = pd.Series([1, 2, 3, 4], index=list(\"abcd\"))\n",
    "# 通过位置索引切片\n",
    "display(s.iloc[0:3])\n",
    "# 通过标签索引切片\n",
    "display(s.loc[\"a\":\"d\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Series的CRUD\n",
    "Series索引-数值CRUD操作：\n",
    "* 获取值\n",
    "* 修改值\n",
    "* 增加索引-值\n",
    "* 删除索引-值\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "a    3000\n",
       "b       2\n",
       "c       3\n",
       "d       4\n",
       "e       5\n",
       "f       6\n",
       "g       7\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "a               3000\n",
       "b                  2\n",
       "c                  3\n",
       "d                  4\n",
       "e                  5\n",
       "f                  6\n",
       "g                  7\n",
       "new_key    new_value\n",
       "dtype: object"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s = pd.Series([1, 2, 3, 4, 5, 6 ,7], index=list(\"abcdefg\"))\n",
    "# 获取值，通过标签索引或位置索引（或者是二者的数组）\n",
    "display(s.loc[\"a\"])\n",
    "display(s.iloc[0])\n",
    "\n",
    "# 修改值\n",
    "s.loc[\"a\"] = 3000\n",
    "display(s)\n",
    "\n",
    "# 增加值 就可以像字典那样进行操作\n",
    "s[\"new_key\"] = \"new_value\"\n",
    "display(s)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b    2\n",
       "c    3\n",
       "d    4\n",
       "e    5\n",
       "f    6\n",
       "g    7\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "c    3\n",
       "d    4\n",
       "e    5\n",
       "f    6\n",
       "g    7\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "e    5\n",
       "f    6\n",
       "g    7\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "c    3\n",
       "d    4\n",
       "e    5\n",
       "f    6\n",
       "g    7\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "e    5\n",
       "f    6\n",
       "g    7\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "s = pd.Series([1, 2, 3, 4, 5, 6 ,7], index=list(\"abcdefg\"))\n",
    "# 删除值 类似字典的操作\n",
    "del s[\"a\"]\n",
    "display(s)\n",
    "\n",
    "# 删除值，通过drop方法。\n",
    "# inplace，就地修改。如果指定为True，则不会返回修改修改后的结果（返回None）。\n",
    "s.drop(\"b\", inplace=True)\n",
    "display(s)\n",
    "\n",
    "# 可以提供一个标签列表，删除多个值。\n",
    "s1 = s.drop([\"c\", \"d\"])\n",
    "display(s1)\n",
    "display(s)\n",
    "s.drop([\"c\", \"d\"], inplace=True)\n",
    "display(s)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
